Team Members

4/4/19

Maria Nakhoul, Eliana Marostica, and Sunny Mahesh

Data Set

4/11/19

We will be using three datasets that are freely available online from the CDC.

library(tidyverse)  # provides read_csv() and the %>% pipe used throughout
setwd("../")
iddr <- read_csv("data/Impaired_Driving_Death_Rate__by_Age_and_Gender__2012___2014__All_States.csv")
## Parsed with column specification:
## cols(
##   State = col_character(),
##   Location = col_character(),
##   `All Ages, 2012` = col_double(),
##   `All Ages, 2014` = col_double(),
##   `Ages 0-20, 2012` = col_double(),
##   `Ages 0-20, 2014` = col_double(),
##   `Ages 21-34, 2012` = col_double(),
##   `Ages 21-34, 2014` = col_double(),
##   `Ages 35+, 2012` = col_double(),
##   `Ages 35+, 2014` = col_double(),
##   `Male, 2012` = col_double(),
##   `Male, 2014` = col_double(),
##   `Female, 2012` = col_double(),
##   `Female, 2014` = col_double()
## )
ocdr <- read_csv("data/Motor_Vehicle_Occupant_Death_Rate__by_Age_and_Gender__2012___2014__All_States.csv")
## Parsed with column specification:
## cols(
##   State = col_character(),
##   Location = col_character(),
##   `All Ages, 2012` = col_double(),
##   `All Ages, 2014` = col_double(),
##   `Age 0-20, 2012` = col_double(),
##   `Age 0-20, 2014` = col_double(),
##   `Age 21-34, 2012` = col_double(),
##   `Age 21-34, 2014` = col_double(),
##   `Age 35-54, 2012` = col_double(),
##   `Age 35-54, 2014` = col_double(),
##   `Age 55+, 2012` = col_double(),
##   `Age 55+, 2014` = col_double(),
##   `Male, 2012` = col_double(),
##   `Male, 2014` = col_double(),
##   `Female, 2012` = col_double(),
##   `Female, 2014` = col_double()
## )
sebt <- read_csv("data/Percentage_of_Drivers_and_Front_Seat_Passengers_Wearing_Seat_Belts__2012___2014__All_States.csv")
## Parsed with column specification:
## cols(
##   State = col_character(),
##   `2012` = col_double(),
##   `2014` = col_double(),
##   Location = col_character()
## )

Impaired Driving Death Rates by Age and Gender (2012 & 2014)

Summarize the variables, data types (temporal, networks, multivariate matrices, etc.), and key statistics (# of elements, # of attributes, # of timepoints, etc.) of your data set.

The key variables here are state, location (the state's name and geographic coordinates), age group (All Ages, 0-20, 21-34, 35+), year (2012, 2014), sex, and death rate. The data are multivariate and numeric: the values are death rates per 100,000 population. There are two time points, 2012 and 2014. Death rates are available for each of the 50 states, Washington, D.C., and the United States as a whole, giving 52 rows.
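The column names in `iddr` pack two variables into one string (for example, `All Ages, 2012` holds both the group and the year). As a minimal sketch, assuming every rate column follows the `Group, Year` pattern shown in the column specification above, the table could be reshaped into a long format with one row per state, group, and year using `tidyr::pivot_longer()`. The helper `to_long()` and the toy tibble are hypothetical illustrations, not part of our pipeline.

```r
library(tibble)
library(tidyr)

# Hypothetical helper: reshape "Group, Year" columns (e.g. "All Ages, 2012")
# into one row per state, group, and year.
# Assumes every column other than State and Location follows that pattern.
to_long <- function(df) {
  out <- pivot_longer(
    df,
    cols = -c(State, Location),
    names_to = c("group", "year"),
    names_sep = ", ",
    values_to = "death_rate"
  )
  out$year <- as.integer(out$year)  # "2012" -> 2012L
  out
}

# Toy data with the same column layout as iddr (values are made up).
toy <- tibble(
  State = c("A", "B"),
  Location = c("A", "B"),
  `All Ages, 2012` = c(3.1, 4.2),
  `All Ages, 2014` = c(2.9, 4.0),
  `Male, 2012` = c(5.0, 6.1)
)

long <- to_long(toy)
print(long)
```

On the real data this would be called as `to_long(iddr)`; the long format makes it easier to filter or facet by group and year.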

iddr %>%
  summary()
##     State             Location         All Ages, 2012   All Ages, 2014 
##  Length:52          Length:52          Min.   : 1.200   Min.   :1.600  
##  Class :character   Class :character   1st Qu.: 2.500   1st Qu.:2.500  
##  Mode  :character   Mode  :character   Median : 3.600   Median :3.200  
##                                        Mean   : 3.916   Mean   :3.624  
##                                        3rd Qu.: 4.800   3rd Qu.:4.600  
##                                        Max.   :11.300   Max.   :8.200  
##                                        NA's   :2        NA's   :3      
##  Ages 0-20, 2012 Ages 0-20, 2014 Ages 21-34, 2012 Ages 21-34, 2014
##  Min.   :0.600   Min.   :0.700   Min.   : 3.300   Min.   : 3.000  
##  1st Qu.:1.200   1st Qu.:0.850   1st Qu.: 5.800   1st Qu.: 5.200  
##  Median :1.600   Median :1.200   Median : 6.850   Median : 6.300  
##  Mean   :1.621   Mean   :1.411   Mean   : 7.971   Mean   : 7.173  
##  3rd Qu.:1.950   3rd Qu.:1.950   3rd Qu.: 9.400   3rd Qu.: 8.200  
##  Max.   :2.800   Max.   :2.800   Max.   :21.400   Max.   :15.400  
##  NA's   :28      NA's   :34      NA's   :10       NA's   :11      
##  Ages 35+, 2012   Ages 35+, 2014    Male, 2012       Male, 2014    
##  Min.   : 1.400   Min.   :1.500   Min.   : 1.700   Min.   : 2.400  
##  1st Qu.: 2.200   1st Qu.:2.350   1st Qu.: 4.100   1st Qu.: 3.700  
##  Median : 3.200   Median :3.100   Median : 5.700   Median : 4.900  
##  Mean   : 3.585   Mean   :3.537   Mean   : 6.271   Mean   : 5.658  
##  3rd Qu.: 4.375   3rd Qu.:4.475   3rd Qu.: 7.525   3rd Qu.: 7.125  
##  Max.   :12.000   Max.   :8.100   Max.   :17.400   Max.   :13.100  
##  NA's   :6        NA's   :6       NA's   :4        NA's   :4       
##   Female, 2012    Female, 2014
##  Min.   :0.700   Min.   :0.8  
##  1st Qu.:1.150   1st Qu.:1.1  
##  Median :1.500   Median :1.3  
##  Mean   :1.663   Mean   :1.6  
##  3rd Qu.:1.850   3rd Qu.:1.8  
##  Max.   :4.000   Max.   :4.3  
##  NA's   :17      NA's   :17

Motor Vehicle Occupant Death Rates by Age and Gender (2012 & 2014)

ocdr %>%
  summary()
##     State             Location         All Ages, 2012   All Ages, 2014  
##  Length:53          Length:53          Min.   : 2.900   Min.   : 2.300  
##  Class :character   Class :character   1st Qu.: 5.350   1st Qu.: 5.250  
##  Mode  :character   Mode  :character   Median : 7.400   Median : 6.800  
##                                        Mean   : 8.516   Mean   : 8.104  
##                                        3rd Qu.:11.050   3rd Qu.:11.100  
##                                        Max.   :20.200   Max.   :21.000  
##                                        NA's   :2        NA's   :2       
##  Age 0-20, 2012   Age 0-20, 2014   Age 21-34, 2012 Age 21-34, 2014
##  Min.   : 1.700   Min.   : 1.200   Min.   : 4.10   Min.   : 3.90  
##  1st Qu.: 3.400   1st Qu.: 3.100   1st Qu.: 8.85   1st Qu.: 8.40  
##  Median : 4.800   Median : 3.700   Median :12.00   Median :10.90  
##  Mean   : 5.114   Mean   : 4.466   Mean   :13.64   Mean   :12.67  
##  3rd Qu.: 6.650   3rd Qu.: 5.800   3rd Qu.:17.95   3rd Qu.:16.20  
##  Max.   :11.000   Max.   :10.200   Max.   :29.60   Max.   :37.50  
##  NA's   :10       NA's   :12       NA's   :6       NA's   :6      
##  Age 35-54, 2012  Age 35-54, 2014  Age 55+, 2012    Age 55+, 2014   
##  Min.   : 2.200   Min.   : 2.600   Min.   : 3.900   Min.   : 3.500  
##  1st Qu.: 4.800   1st Qu.: 5.600   1st Qu.: 6.975   1st Qu.: 6.700  
##  Median : 7.600   Median : 8.100   Median : 9.200   Median : 8.750  
##  Mean   : 8.922   Mean   : 8.624   Mean   : 9.800   Mean   : 9.393  
##  3rd Qu.:12.300   3rd Qu.:11.600   3rd Qu.:12.025   3rd Qu.:11.425  
##  Max.   :27.000   Max.   :21.000   Max.   :20.700   Max.   :19.700  
##  NA's   :8        NA's   :8        NA's   :7        NA's   :7       
##    Male, 2012      Male, 2014     Female, 2012     Female, 2014   
##  Min.   : 4.10   Min.   : 3.80   Min.   : 1.700   Min.   : 1.500  
##  1st Qu.: 6.40   1st Qu.: 6.90   1st Qu.: 4.175   1st Qu.: 4.100  
##  Median :10.20   Median : 9.70   Median : 5.400   Median : 5.000  
##  Mean   :11.35   Mean   :11.09   Mean   : 5.852   Mean   : 5.702  
##  3rd Qu.:15.10   3rd Qu.:15.00   3rd Qu.: 6.950   3rd Qu.: 7.400  
##  Max.   :29.30   Max.   :27.70   Max.   :12.900   Max.   :14.300  
##  NA's   :2       NA's   :4       NA's   :5        NA's   :6

There are two time points for the OCDR data, 2012 and 2014, and two main attributes, sex and age. There are 52 elements (the 50 states, DC, and the USA as a whole), although the summary reports 53 rows. All of the data are numeric: the values are death rates per 100,000 population for each specific category.

Percentage of Drivers and Front Seat Passengers Wearing Seat Belts (2012 & 2014)

sebt %>%
  summary()
##     State                2012            2014         Location        
##  Length:51          Min.   :67.00   Min.   :69.00   Length:51         
##  Class :character   1st Qu.:80.00   1st Qu.:82.50   Class :character  
##  Mode  :character   Median :84.00   Median :87.00   Mode  :character  
##                     Mean   :85.18   Mean   :86.57                     
##                     3rd Qu.:91.00   3rd Qu.:92.00                     
##                     Max.   :97.00   Max.   :98.00
indicies_max_2012<-which(sebt$`2012`==max(sebt$`2012`))
sebt[indicies_max_2012,]
## # A tibble: 2 x 4
##   State      `2012` `2014` Location                              
##   <chr>       <dbl>  <dbl> <chr>                                 
## 1 Oregon         97     98 "Oregon\n(44.567912, -120.156945)"    
## 2 Washington     97     95 "Washington\n(47.517368, -120.467672)"
indicies_min_2012<-which(sebt$`2012`==min(sebt$`2012`))
sebt[indicies_min_2012,]
## # A tibble: 1 x 4
##   State        `2012` `2014` Location                               
##   <chr>         <dbl>  <dbl> <chr>                                  
## 1 South Dakota     67     69 "South Dakota\n(44.35371, -100.373709)"
indicies_max_2014<-which(sebt$`2014`==max(sebt$`2014`))
sebt[indicies_max_2014,]
## # A tibble: 1 x 4
##   State  `2012` `2014` Location                          
##   <chr>   <dbl>  <dbl> <chr>                             
## 1 Oregon     97     98 "Oregon\n(44.567912, -120.156945)"
indicies_min_2014<-which(sebt$`2014`==min(sebt$`2014`))
sebt[indicies_min_2014,]
## # A tibble: 1 x 4
##   State        `2012` `2014` Location                               
##   <chr>         <dbl>  <dbl> <chr>                                  
## 1 South Dakota     67     69 "South Dakota\n(44.35371, -100.373709)"

In 2012, the states with the highest percentage of drivers and front seat passengers wearing seat belts were Oregon and Washington (97%), while the state with the lowest percentage was South Dakota (67%).

In 2014, the state with the highest percentage was Oregon (98%), and the state with the lowest percentage was again South Dakota (69%).
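The lookups above use `which()` with an equality test, which keeps ties such as Oregon and Washington sharing the 2012 maximum. A minimal sketch wrapping that pattern in a reusable function; `extreme_states()` is a hypothetical helper, and the toy data frame uses made-up percentages.

```r
# Hypothetical helper: every state whose value in `column` equals the extreme
# returned by `fun` (max or min). Ties are kept, and NA values are ignored.
extreme_states <- function(df, column, fun = max) {
  values <- df[[column]]
  target <- fun(values, na.rm = TRUE)
  df$State[!is.na(values) & values == target]
}

# Toy data with made-up percentages; A and B tie for the maximum.
toy <- data.frame(
  State = c("A", "B", "C", "D"),
  `2012` = c(90, 90, 60, 75),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

extreme_states(toy, "2012", max)  # "A" "B"
extreme_states(toy, "2012", min)  # "C"
```

Applied to our data, `extreme_states(sebt, "2014", min)` would repeat the South Dakota lookup in one call.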

Visualization Tasks and Requirements

4/18/19

  1. Describe what kind of information can be derived through exploratory visualization analysis of the data set. The information we can derive from our dataset includes:
     a. Geographical data
     b. Temporal data
     c. Age stratification
     d. Gender stratification
     e. Correlations
  2. Identify the target audience for the visualization tool that you will build. Target audience:
     a. Insurance companies
     b. CDC
     c. DMV
     d. General audience
     e. Policy makers
  3. Develop a list of visualization tasks for the data set. Visualizations:
     a. Correlation heatmap
     b. US choropleth map by state, by year or age
     c. US choropleth map by designated regions (e.g., Midwest, West)
     d. Boxplots showing summary statistics when you click on a state
     e. Hovering over the maps to show the data

Sketches and 5 Design Sheets

4/25/19

Many of our sketches were done on whiteboards, but we have copies of our 5 design sheets below.

5 Design Sheets #1

5 Design Sheets #2

5 Design Sheets #3

5 Design Sheets #4

5 Design Sheets #5

*Edit 5/9/19: The 5 design sheets were made prior to the addition of the temporal dataset.

Updates

4/29/19

We have now added temporal data to our datasets. An updated, comprehensive explanation of our datasets is below.

The data we are using consist of several parts, all concerning motor-vehicle deaths and drawn from two different sources. The first source is the CDC, from which we obtained three datasets: Impaired Driving Death Rate by Age and Gender, 2012 & 2014, All States; Percentage of Drivers and Front Seat Passengers Wearing Seat Belts, 2012 & 2014, All States; and Motor Vehicle Occupant Death Rate by Age and Gender, 2012 & 2014, All States. The death-rate datasets are broken down by state, by age group (All Ages, Age 0-20, Age 21-34, Age 35-54, and Age 55+; the impaired driving dataset uses a single Ages 35+ group instead of 35-54 and 55+), and by gender (Male or Female), for the years 2012 and 2014. The impaired driving dataset contains death rates per state per 100,000 population for individuals with a blood alcohol concentration (BAC) of 0.08% or higher [1]. The seat belt dataset contains the percentage of individuals wearing seat belts; these data were collected by the National Occupant Protection Use Survey (NOPUS) [2]. The motor vehicle occupant dataset contains death rates by age or gender per 100,000 population. These data were collected by the Fatality Analysis Reporting System (FARS) in 2012 and by the National Highway Traffic Safety Administration’s (NHTSA) Fatality Analysis Reporting System (FARS) [3].

Because these datasets contained only two time points, we used a second source to obtain more temporal data: the Insurance Institute for Highway Safety and Highway Loss Data Institute (IIHS-HLDI). We were able to use this data because it was collected by the same source as the data in the CDC datasets, FARS [4]. From this website we collected three datasets. The fatal crash totals contain, for every state, the number of deaths and the deaths per 100,000 population; because the same information was available for each year from 2005 to 2017, this solved our temporal problem. The deaths by road user dataset breaks the total deaths from the previous dataset down by state and by road user: car occupants, pickup and SUV occupants, large truck occupants, motorcyclists, pedestrians, and bicyclists. Finally, the restraint use dataset gives the percentage of observed seat belt use per state.

Our previous Visualization Tasks and Requirements:

  1. Describe what kind of information can be derived through exploratory visualization analysis of the data set. The information we can derive from our dataset includes:
     a. Geographical data
     b. Temporal data
     c. Age stratification
     d. Gender stratification
     e. Correlations
  2. Identify the target audience for the visualization tool that you will build. Target audience:
     a. Insurance companies
     b. CDC
     c. DMV
     d. General audience
     e. Policy makers

The first and second requirements remain the same: the data are of interest to the same audience, and the same types of information can be derived from both sources, since both were originally collected by FARS. Our third requirement, however, has changed because our data now contain more time points.

  3. Develop a list of visualization tasks for the data set. Visualizations:
     a. Correlation scatter plot
     b. US choropleth map by state, by year or age
     c. US choropleth maps showing similarity across all the years, with the option to enlarge a specific year
     d. Boxplots showing summary statistics when you click on a state
     e. Hovering over the maps to show the data
     f. Line graphs showing trends over time
     g. Small multiples

New Data Summary Statistics

#install.packages("rio")
library(rio)
setwd("../")
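# import_list() reads every sheet of each workbook (one sheet per year);
# rbind = TRUE stacks the sheets into a single tibble.
fatal_car_crashes <- import_list("data/fatal car_crash.xlsx", setclass = "tbl", rbind = TRUE)
deaths_by_road_users <- import_list("data/Deaths by road users.xlsx", setclass = "tbl", rbind = TRUE)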
## New names:
## * `Car occupants ` -> `Car occupants ..2`
## * `Car occupants ` -> `Car occupants ..3`
## * `Pickup and SUV occupants` -> `Pickup and SUV occupants..4`
## * `Pickup and SUV occupants` -> `Pickup and SUV occupants..5`
## * `Large truck occupants ` -> `Large truck occupants ..6`
## * … and 9 more
# Each yearly sheet has 52 rows: 51 state-level rows followed by a United States total.
# indicies holds the positions of the 13 United States total rows (one per year).
indicies <- 52 * (1:13)
deaths_car_crashes <- fatal_car_crashes[-indicies, ]

# The United States totals feed the line graph of deaths over time.
line_graph <- fatal_car_crashes[indicies, ]
colnames(line_graph) <- c("State", "Population", "Deaths", "Deaths_per_100000_Population", "Year")
line_graph$Year <- as.character(2017:2005)  # sheets are stacked from 2017 down to 2005

# Drop blank rows. Logical indexing is safe even when there are no NA rows,
# whereas x[-which(...), ] would drop every row if which() returned nothing.
deaths_by_road_users <- deaths_by_road_users[!is.na(deaths_by_road_users$State), ]
road_users_deaths <- deaths_by_road_users[-indicies, ]

# One year label per state row: 51 rows for each year from 2017 down to 2005.
year <- rep(2017:2005, each = 51)

deaths_car_crashes=deaths_car_crashes[,1:4]
deaths_car_crashes$Year=year

road_users_deaths_column_names=c("State","Car_Occupant_Death_Number","Car_Occupant_Death_Percent","Pickup_and_SUV_Occupant_Death_Number","Pickup_and_SUV_Occupant_Death_Percent","Large_Truck_Occupant_Death_Number","Large_Truck_Occupant_Death_Percent","Motorcyclists_Occupant_Death_Number","Motorcyclists_Occupant_Death_Percent","Pedestrians_Occupant_Death_Number","Pedestrians_Occupant_Death_Percent","Bicyclists_Occupant_Death_Number","Bicyclists_Occupant_Death_Percent","Total_Occupant_Death_Number","Total_Occupant_Death_Percent","Year")
colnames(road_users_deaths)=road_users_deaths_column_names
road_users_deaths$Year=year
head(road_users_deaths%>%group_by(State)%>%summarize(n=as.integer(max(Car_Occupant_Death_Number)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
##   State           n
##   <chr>       <int>
## 1 California   1954
## 2 Texas        1375
## 3 Texas        1238
## 4 California    994
## 5 Florida       931
## 6 Florida       924
head(road_users_deaths%>%group_by(State)%>%summarize(n=as.integer(max(Pickup_and_SUV_Occupant_Death_Number)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
##   State           n
##   <chr>       <int>
## 1 Texas        1195
## 2 Texas         975
## 3 California    870
## 4 Florida       760
## 5 California    642
## 6 Florida       611
head(road_users_deaths%>%group_by(State)%>%summarize(n=as.integer(max(Large_Truck_Occupant_Death_Number)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
##   State           n
##   <chr>       <int>
## 1 Texas          93
## 2 Texas          81
## 3 California     47
## 4 Florida        46
## 5 Georgia        44
## 6 California     42
head(road_users_deaths%>%group_by(State)%>%summarize(n=as.integer(max(Motorcyclists_Occupant_Death_Number)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
##   State           n
##   <chr>       <int>
## 1 Florida       590
## 2 California    548
## 3 Florida       546
## 4 California    535
## 5 Texas         512
## 6 Texas         490
head(road_users_deaths%>%group_by(State)%>%summarize(n=as.integer(max(Pedestrians_Occupant_Death_Number)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
##   State           n
##   <chr>       <int>
## 1 California    867
## 2 California    742
## 3 Texas         672
## 4 Florida       654
## 5 Florida       576
## 6 Texas         419
head(road_users_deaths%>%group_by(State)%>%summarize(n=as.integer(max(Bicyclists_Occupant_Death_Number)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
##   State           n
##   <chr>       <int>
## 1 Florida       107
## 2 California     99
## 3 California     99
## 4 Florida        83
## 5 Texas          65
## 6 New York       57
head(road_users_deaths%>%group_by(State)%>%summarize(n=as.integer(max(Total_Occupant_Death_Number)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
##   State           n
##   <chr>       <int>
## 1 California   4329
## 2 Texas        3776
## 3 California   3623
## 4 Florida      3543
## 5 Texas        3504
## 6 Florida      3174
summary(road_users_deaths)
##     State           Car_Occupant_Death_Number Car_Occupant_Death_Percent
##  Length:663         Length:663                Length:663                
##  Class :character   Class :character          Class :character          
##  Mode  :character   Mode  :character          Mode  :character          
##                                                                         
##                                                                         
##                                                                         
##  Pickup_and_SUV_Occupant_Death_Number
##  Length:663                          
##  Class :character                    
##  Mode  :character                    
##                                      
##                                      
##                                      
##  Pickup_and_SUV_Occupant_Death_Percent Large_Truck_Occupant_Death_Number
##  Length:663                            Length:663                       
##  Class :character                      Class :character                 
##  Mode  :character                      Mode  :character                 
##                                                                         
##                                                                         
##                                                                         
##  Large_Truck_Occupant_Death_Percent Motorcyclists_Occupant_Death_Number
##  Length:663                         Length:663                         
##  Class :character                   Class :character                   
##  Mode  :character                   Mode  :character                   
##                                                                        
##                                                                        
##                                                                        
##  Motorcyclists_Occupant_Death_Percent Pedestrians_Occupant_Death_Number
##  Length:663                           Length:663                       
##  Class :character                     Class :character                 
##  Mode  :character                     Mode  :character                 
##                                                                        
##                                                                        
##                                                                        
##  Pedestrians_Occupant_Death_Percent Bicyclists_Occupant_Death_Number
##  Length:663                         Length:663                      
##  Class :character                   Class :character                
##  Mode  :character                   Mode  :character                
##                                                                     
##                                                                     
##                                                                     
##  Bicyclists_Occupant_Death_Percent Total_Occupant_Death_Number
##  Length:663                        Length:663                 
##  Class :character                  Class :character           
##  Mode  :character                  Mode  :character           
##                                                               
##                                                               
##                                                               
##  Total_Occupant_Death_Percent      Year     
##  Length:663                   Min.   :2005  
##  Class :character             1st Qu.:2008  
##  Mode  :character             Median :2011  
##                               Mean   :2011  
##                               3rd Qu.:2014  
##                               Max.   :2017

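Across the tables above, Texas, California, and Florida account for nearly all of the largest per-state death counts in every road-user category, including car occupants; the only other states that appear are Georgia (large truck occupants) and New York (bicyclists).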
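One caution about the tables above: `summary(road_users_deaths)` reports the count columns as `character`, so `max()` compares them as strings rather than numbers, and several states appear twice, which suggests their names are not identical across the yearly sheets. A minimal sketch of a numeric per-state maximum, assuming the counts contain only plain digits (values with thousands separators would need something like `readr::parse_number()` instead) and assuming the duplicated states differ only in surrounding whitespace; `state_max()` is a hypothetical helper and the toy data are made up.

```r
library(dplyr)

# Hypothetical helper: per-state maximum of a count column stored as text.
# as.numeric() runs first so max() compares numbers; compared as strings,
# "994" would beat "1954" because "9" sorts after "1".
# trimws() assumes duplicate states differ only in surrounding whitespace.
state_max <- function(df, column) {
  df %>%
    mutate(
      State = trimws(State),
      value = suppressWarnings(as.numeric(.data[[column]]))
    ) %>%
    group_by(State) %>%
    summarize(n = max(value, na.rm = TRUE)) %>%
    arrange(desc(n))
}

# Toy data: "X " and "X" are the same state with stray whitespace.
toy <- data.frame(
  State = c("X", "X ", "Y", "Y"),
  Count = c("1954", "994", "120", "80"),
  stringsAsFactors = FALSE
)

max(toy$Count)           # "994": string comparison
state_max(toy, "Count")  # X 1954, Y 120
```

On our data the call would be `state_max(road_users_deaths, "Car_Occupant_Death_Number")`.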

colnames(deaths_car_crashes)=c("State","Population","Deaths","Deaths_per_100000_Population","Year")
head(deaths_car_crashes%>%group_by(State)%>%summarize(n=as.integer(max(Deaths_per_100000_Population)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
##   State           n
##   <chr>       <int>
## 1 Wyoming        37
## 2 Mississippi    31
## 3 Montana        28
## 4 Wyoming        27
## 5 Alabama        26
## 6 New Mexico     25
head(deaths_car_crashes%>%group_by(State)%>%summarize(n=as.integer(max(Deaths)))%>%arrange(desc(n)))
## # A tibble: 6 x 2
##   State           n
##   <chr>       <int>
## 1 California   4329
## 2 Texas        3776
## 3 California   3623
## 4 Florida      3543
## 5 Texas        3504
## 6 Florida      3174
summary(deaths_car_crashes)
##     State             Population           Deaths      
##  Length:663         Min.   :  509294   Min.   :  15.0  
##  Class :character   1st Qu.: 1644697   1st Qu.: 223.0  
##  Mode  :character   Median : 4293204   Median : 505.0  
##                     Mean   : 6105580   Mean   : 712.1  
##                     3rd Qu.: 6829052   3rd Qu.: 937.5  
##                     Max.   :39536653   Max.   :4329.0  
##  Deaths_per_100000_Population      Year     
##  Min.   : 2.40                Min.   :2005  
##  1st Qu.: 9.00                1st Qu.:2008  
##  Median :12.10                Median :2011  
##  Mean   :13.03                Mean   :2011  
##  3rd Qu.:16.20                3rd Qu.:2014  
##  Max.   :37.90                Max.   :2017

Interestingly, although most of the deaths occurred in Texas, California, and Florida, Wyoming, Mississippi, and Montana have the highest maximum deaths per 100,000 population. This is likely because Texas, California, and Florida have very large populations: dividing by population size offsets their high death counts, while a sparsely populated state can reach a high rate with comparatively few deaths.
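The population effect described above follows directly from how the rate is defined: deaths per 100,000 population equals Deaths / Population × 100,000. A minimal sketch with made-up numbers, where the same death count gives very different rates in a small and a large state; `rate_per_100k()` is a hypothetical helper.

```r
# Deaths per 100,000 population = Deaths / Population * 100,000
rate_per_100k <- function(deaths, population) {
  deaths / population * 1e5
}

# Made-up example: the same death count in a small and a large state.
small_state <- rate_per_100k(deaths = 150, population = 600000)    # 25
large_state <- rate_per_100k(deaths = 150, population = 30000000)  # 0.5
```

Applied to our table, `with(deaths_car_crashes, rate_per_100k(Deaths, Population))` could serve as a rough consistency check against `Deaths_per_100000_Population`. Note also that `as.integer()` in the summaries above truncates rather than rounds, which is why Wyoming's maximum rate of 37.9 appears as 37.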

We did not recreate all five design sheets; however, we did create an ideas sheet and sketches for the new dataset.

Ideas Sheet #1

Contributions

5/9/2019

All members contributed equally to the design and production processes. Most of the design and coding was done together with all group members present, so it is difficult to assess each member's specific contributions. It should be noted, however, that Maria Nakhoul championed the debugging efforts on many of the visualizations, and all team members agree that she deserves recognition for her efforts and success there. Discussions about complex debugging and code cleanup also took place as a group.

Screenshots and Captions

5/9/19

The navigation bar in our visualization lets you switch between its different tabs.

Screenshots #1

The first tab of our visualization focuses on the CDC data for Motor Vehicle Occupant Death Rates in 2012 and 2014. Because the CDC data are stratified by age, gender, and year, we were able to create boxplot distributions to visualize the differences between 2012 and 2014. The bar plot shows the death rate per 100,000 population for every state, in increasing order, for a selected year, and the choropleth map shows the same death rates geographically.

Screenshots #2

Clicking a boxplot highlights it in a different color, filters the data to that selection, and updates the sidebar panel. A label naming the selected boxplot is displayed under the boxplots. Clicking a bar in the bar plot highlights the corresponding state in the choropleth map, and vice versa.

Screenshots #3

The second tab of our visualization compares a reference-year choropleth map with a map selected from the small multiples, using the Highway Loss Data Institute dataset of death rates per 100,000 population per state for the years 2005-2017. The small multiples let you view all the years at once and compare them. Below the small multiples, a labeled selector lets you choose which small multiple to enlarge; once a map is enlarged, the selected year is displayed as a reminder.

Screenshots #4

Once you choose a reference year and a small multiple, the two choropleth maps appear side by side for comparison, with the reference year on the left and the selected small multiple on the right. When you hover over a state in either map, or in the small multiples above, a bar plot appears showing the Highway Loss Data Institute data on the types of road users involved in fatal crashes for the years 2005-2017. Because the death totals in the two Highway Loss Data Institute datasets are the same, we were able to link them for this visualization.
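The link described above relies on the two Highway Loss Data Institute tables reporting the same death totals. A minimal sketch of checking that assumption by joining on `State` and `Year` and comparing `Deaths` with `Total_Occupant_Death_Number` (converted from character to numeric); `totals_match()` is a hypothetical helper and the toy tables are made up.

```r
# Hypothetical check: do the road-user totals equal the crash-table deaths
# for every matching state and year?
totals_match <- function(crashes, road_users) {
  joined <- merge(
    crashes[, c("State", "Year", "Deaths")],
    road_users[, c("State", "Year", "Total_Occupant_Death_Number")],
    by = c("State", "Year")
  )
  # The road-user counts are stored as text, so convert before comparing.
  total <- suppressWarnings(as.numeric(joined$Total_Occupant_Death_Number))
  nrow(joined) > 0 && isTRUE(all.equal(joined$Deaths, total))
}

crashes <- data.frame(State = c("A", "B"), Year = 2005, Deaths = c(100, 200),
                      stringsAsFactors = FALSE)
road_users <- data.frame(State = c("A", "B"), Year = 2005,
                         Total_Occupant_Death_Number = c("100", "200"),
                         stringsAsFactors = FALSE)

totals_match(crashes, road_users)  # TRUE
road_users$Total_Occupant_Death_Number[2] <- "201"
totals_match(crashes, road_users)  # FALSE
```

On our data this would be `totals_match(deaths_car_crashes, road_users_deaths)`, assuming the `State` values are spelled identically in both tables.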

Screenshots #5

The third tab shows the difference in death rates between each year and a reference year. You choose a reference year, and each of the other years' maps shows that year's rates minus the reference year's rates. Blue indicates negative values and red indicates positive values. Of course, one of the maps remains red because we did not subtract the reference year from itself, so its values stay positive. We were unable to add a color scale: because our small multiples are created by a recursive function, their color scales were stacked on top of each other and overlapped, making the values unreadable. The line graph shows the total number of deaths across all states for each year, and when a reference year is chosen, the point for that year turns red.
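The difference maps can be sketched as a join against the reference year followed by a subtraction and a sign-based color. This is a minimal sketch, not the code behind our visualization; it assumes a long table with the `State`, `Year`, and `Deaths_per_100000_Population` columns of `deaths_car_crashes`, and unlike our tool it does subtract the reference year from itself, so that year comes out as zero (white). `diff_from_reference()` is a hypothetical helper and the toy values are made up.

```r
# Hypothetical helper: each state's rate in every year minus its rate in the
# reference year, plus a sign-based color (blue negative, red positive).
diff_from_reference <- function(df, ref_year) {
  ref <- df[df$Year == ref_year, c("State", "Deaths_per_100000_Population")]
  names(ref)[2] <- "Reference_Rate"
  out <- merge(df, ref, by = "State")
  out$Difference <- out$Deaths_per_100000_Population - out$Reference_Rate
  out$Color <- ifelse(out$Difference < 0, "blue",
                      ifelse(out$Difference > 0, "red", "white"))
  out[order(out$Year, out$State), ]
}

toy <- data.frame(
  State = rep(c("A", "B"), times = 2),
  Year = rep(c(2005, 2006), each = 2),
  Deaths_per_100000_Population = c(10, 12, 8, 15),
  stringsAsFactors = FALSE
)

res <- diff_from_reference(toy, ref_year = 2005)
res[, c("State", "Year", "Difference", "Color")]
```

Computing all differences in one table like this also makes it possible to share a single color scale, with symmetric limits around zero, across every small multiple.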

Screenshots #6

The fourth tab shows a scatter plot of the correlation between the percentage of seat belt use in each state and the death rate for a selected reference year. The bar plot shows the number of deaths from driving under the influence in each state for the reference year, and the line graph shows the total number of deaths caused by driving under the influence in the United States for the years 2009-2017. We created this tab to explore whether driving under the influence is connected to deaths even when seat belts are used.
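The relationship in the scatter plot can be summarized with a single Pearson correlation coefficient per reference year. A minimal sketch, assuming the seat belt and death rate tables have been reshaped into long form with `State` and `Year` columns; `Belt_Use_Percent` and `belt_rate_correlation()` are hypothetical names, and the toy values are made up.

```r
# Hypothetical helper: Pearson correlation between seat belt use (%) and the
# death rate across states for one reference year.
belt_rate_correlation <- function(belts, rates, ref_year) {
  joined <- merge(
    belts[belts$Year == ref_year, ],
    rates[rates$Year == ref_year, ],
    by = c("State", "Year")
  )
  cor(joined$Belt_Use_Percent, joined$Deaths_per_100000_Population,
      use = "complete.obs")
}

belts <- data.frame(State = c("A", "B", "C"), Year = 2014,
                    Belt_Use_Percent = c(90, 80, 70),
                    stringsAsFactors = FALSE)
rates <- data.frame(State = c("A", "B", "C"), Year = 2014,
                    Deaths_per_100000_Population = c(5, 10, 15),
                    stringsAsFactors = FALSE)

r <- belt_rate_correlation(belts, rates, ref_year = 2014)
print(r)  # -1 (perfectly linear, decreasing toy data)
```

A coefficient near -1 would mean that higher seat belt use goes with lower death rates; a correlation on its own does not establish cause.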

Screenshots #7

When you choose a reference year, the point for that year is highlighted in the line graph, and the scatter plot and bar plot are filtered to that year. When you click on a bar in the bar plot, the corresponding point appears in the scatter plot as well.

Screenshots #8

Future Work

5/9/19

Future work could look further into the unexpected correlations in the scatter plots by mapping color to other potential contributors to motor-vehicle death rates, such as impaired driving rates and road conditions. It would also be interesting to find more FARS data that has not been presented to the public, or to see whether FARS data can be stratified by age, gender, and year the way the CDC data are.

References